ABSTRACT


Many biologic processes give responses that decrease rapidly over time to some
asymptote $C \geq 0$. The mathematical expressions that describe the various phenomena
vary  in  complexity  and  form.  To  fit  a  curve  to  data  in  any  such  situation,  one  must 
consider  both  the  formula  for  the  trend  and  the  nature  of  the  deviations  or  error 
terms.  A  discussion  of  these  problems  is  given  to  indicate  to  the  data  analyst  the 
possible  choices  that  are  his,  relative  to  assumptions  about  error  terms  and  relative 
to  technics  or  methods  of  estimation.  No  new  analytic  procedures  are  given. 

It is noted that several methods of estimation are being studied for certain assumed
model  equations.  The  results  of  these  random  sampling  experiments  will  be  reported 
in  subsequent  papers. 
SOME PROBLEMS ON THE USE OF NEGATIVE EXPONENTIAL CURVES IN


1.  INTRODUCTION 

Many  biologic  processes  give  responses  that 
decrease  rapidly  over  time  to  some  asymptote 
$C \geq 0$. The mathematical expressions that
describe the various phenomena vary in complexity
and form. To fit a curve to data in any
such  situation,  one  must  consider  both  the 
formula  for  the  trend  and  the  nature  of  the 
deviations  or  error  terms.  The  intent  of  the 
discussion  that  follows  will  be  to  help  indicate 
to  the  data  analyst  the  possible  choices  that 
are  his,  relative  to  assumptions  about  error 
terms  and  relative  to  technics  or  methods  of 
estimation.  No  new  analytic  procedures  are 
proposed.  First,  there  will  be  a  consideration 
of  equation  motivation.  This  will  be  followed 
by a discussion of the error terms in the model
equations.  Possible  extensions  of  certain 
models  will  be  noted,  followed  by  a  discussion 
of  various  estimation  methods.  Hypothesis 
testing  is  briefly  discussed,  with  a  final  section 
devoted  to  concluding  remarks. 

2.  MOTIVATION  OF  THE  MODEL 
EQUATIONS 

For  a  continuous  process  such  as  an 
organism’s  blood-sugar  level,  a  scientist  might 
simply  observe  that  following  the  ingestion  of 
a  meal,  there  is  an  increase  and  a  subsequent 
decrease  of  the  level  of  the  sugar  in  the  blood. 
It  might  be  further  observed  that  the  shape 
of  the  declining  curve  from  the  highest  point 
attained  appears  to  be  of  a  negative  exponential 
form, $\alpha \exp(-\beta t)$. By trying various

transformations,  a  straight  line  might  be 
obtained  when  the  logarithm  of  the  response 
is  plotted  against  time,  or  the  ratio  of  the 
observation at time $t$ to that at time $t - 1$
might  give  a  constant  for  all  values  of  t.  This 
is empirical curve fitting or modeling. No
rationale is presented for the choice of the equation.
Such empirical curve fitting may in turn
suggest mechanisms and thus lead to better
understanding, of course.

In  opposition  to  empirical  curve  fitting  is 
the mechanistic approach. From first principles,
the scientist tries to predict the equation
or  the  form  of  the  response  as  a  function  of 
time.  For  a  simplified  example,  consider  the 
rate  of  the  nitrogen  washout  of  the  lung  for 
an  animal  breathing  room  air  and  placed  on 
pure  oxygen.  The  room  air  concentration  of 
nitrogen  is  about  79%.  Respiration  depth  is 
kept  at  a  constant  level;  rate  is  allowed  to 
vary. The nitrogen concentration of each expired
volume of air is determined. If the total
resting lung volume is denoted by $v$ and the
volume of inspired air at each breath by $\Delta v$,
then initially, or at time zero,

$$F_{N_2}(0) = \text{concentration of nitrogen} = .79$$

$$F_{O_2}(0) = \text{concentration of oxygen} = .21$$

$$V_{N_2}(0) = \text{volume of nitrogen} = .79v$$

$$V_{O_2}(0) = \text{volume of oxygen} = .21v$$

and after the first breath of pure oxygen,

$$F_{N_2}(1) = \frac{V_{N_2}(0)}{V_{N_2}(0) + V_{O_2}(0) + \Delta v} = \frac{.79v}{.79v + (.21v + \Delta v)} = .79\,\frac{v}{v + \Delta v} = .79\omega,$$

$$F_{O_2}(1) = \frac{.21v + \Delta v}{.79v + .21v + \Delta v} = \frac{.21v + \Delta v}{v + \Delta v},$$

$$V_{N_2}(1) = F_{N_2}(1)\,v = .79\omega v,$$

$$V_{O_2}(1) = F_{O_2}(1)\,v = \frac{.21v + \Delta v}{v + \Delta v}\,(v) = v - .79\omega v,$$

where

$$\omega = \frac{v}{v + \Delta v}.$$
The concentration after the second breath
will be

$$F_{N_2}(2) = \frac{V_{N_2}(1)}{V_{N_2}(1) + V_{O_2}(1) + \Delta v} = \frac{.79\omega v}{v + \Delta v} = .79\omega^2.$$


Continuing  in  the  same  manner,  it  can  be  shown 
that 

$$F_{N_2}(t) = .79\omega^t;$$

$\omega$ is called the dilution ratio. Thus, the concentration
at breath $t$ is found by multiplying
the concentration at breath $t - 1$ by the dilution
ratio $\omega$, or $F_{N_2}(t) = \omega F_{N_2}(t - 1)$. The
equation is generated from mechanical principles
that are peculiar to this biologic process.
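
The breath-by-breath bookkeeping above can be checked numerically. The following is a minimal sketch in Python, assuming hypothetical volumes ($v = 3.0$ and $\Delta v = 0.5$, in litres) and a function name, `washout_concentrations`, that does not appear in the original. It repeats the dilution step and prints each result beside $.79\omega^t$.

```python
def washout_concentrations(v, dv, breaths, f0=0.79):
    """Nitrogen fraction after each breath of pure oxygen.

    v is the resting lung volume and dv the volume inspired per breath
    (same units). Returns [F(0), F(1), ..., F(breaths)].
    """
    vol_n2 = f0 * v                  # nitrogen volume in the resting lung
    fractions = [f0]
    for _ in range(breaths):
        # Inspired oxygen dilutes the lung gas to a total volume v + dv.
        frac = vol_n2 / (v + dv)
        # Expiration returns the lung to volume v at that same fraction.
        vol_n2 = frac * v
        fractions.append(frac)
    return fractions


v, dv = 3.0, 0.5                     # hypothetical volumes, litres
omega = v / (v + dv)                 # dilution ratio
for t, frac in enumerate(washout_concentrations(v, dv, 5)):
    print(t, round(frac, 6), round(0.79 * omega ** t, 6))
```

The two printed columns agree at every breath, which is the content of the closed form: each breath multiplies the previous concentration by the same dilution ratio.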

Parenthetically,  it  should  be  observed  that 
for  many  applications,  the  implication  is  that 
observations  are  made  at  discrete  and  usually 
equidistant time points. For some situations,
the response cannot be measured and may possibly
not even be defined at nonintegral values
of $t$. For others, the process may be truly continuous
but with sampling performed at fixed
time intervals. In still other situations, the
independent variable may not be chronologic
time, but only time related. Thus, in respiration
studies, the concentration of a gas in each
volume of expired air may be determined. If
respiration rate is not fixed, the actual time
between breaths is not fixed. In this case, it
may be desirable to view breath number as the
time metameter.

More complicated equations, such as the
sum of two or more negative exponential terms,
may be found in similar ways: by trial and
error and past experience, a curve may be fitted
empirically; or from a knowledge of the
mechanisms involved, an equation may be
derived to describe the response. In both approaches,
there is probably a tendency toward
parsimony; i.e., one attempts to explain or fit
the data with the simplest expression. And
in most curve-fitting situations the distinction
between arbitrary curve fitting and mechanistic
modeling is not as clear cut as indicated above.
There is very likely a bit of empiricism as well
as theory generation in nearly every curve-fitting
situation.


Not only might the equation to be fitted
be arrived at in many different ways, but
different-appearing equations are used to describe
the same process. That is, the same
process may be represented by different
equations. For example, in the nitrogen washout
example above, the equation $F_{N_2}(t) = \omega F_{N_2}(t - 1)$
led to $F_{N_2}(t) = .79\omega^t$. In general,
an $m$th order difference equation may be
thought of as the model equation. Yet, its
solution will be the sum of $m$ exponential terms.
Which equation one uses to fit to the data will
depend largely on the assumptions about the
error terms, a consideration of which will be
the subject of the next section.
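
The correspondence between an $m$th order difference equation and a sum of $m$ exponential terms can be illustrated for $m = 2$. The sketch below, in Python, assumes hypothetical parameter values and function names. It generates a two-term curve $\alpha_1\omega_1^t + \alpha_2\omega_2^t$ directly, and again from the second-order recursion $\eta_t = (\omega_1 + \omega_2)\eta_{t-1} - \omega_1\omega_2\eta_{t-2}$, whose coefficients come from expanding $(x - \omega_1)(x - \omega_2)$.

```python
def two_term_curve(a1, w1, a2, w2, n):
    """Closed form: eta_t = a1 * w1**t + a2 * w2**t for t = 0..n-1."""
    return [a1 * w1 ** t + a2 * w2 ** t for t in range(n)]


def second_order_recursion(eta0, eta1, w1, w2, n):
    """Same curve from the difference equation with two starting values."""
    c1 = w1 + w2          # coefficient of eta_{t-1}
    c2 = -w1 * w2         # coefficient of eta_{t-2}
    eta = [eta0, eta1]
    for t in range(2, n):
        eta.append(c1 * eta[t - 1] + c2 * eta[t - 2])
    return eta[:n]


# Hypothetical values chosen only for illustration.
direct = two_term_curve(0.6, 0.5, 0.4, 0.9, 15)
recursed = second_order_recursion(direct[0], direct[1], 0.5, 0.9, 15)
print(max(abs(a - b) for a, b in zip(direct, recursed)))
```

The printed maximum discrepancy is at the level of floating-point rounding, so the two descriptions differ only in form until an error term is attached to one of them.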

That a given process can be described in
different ways is rather obvious; the point is
that since biologists present equations in many
forms, the data analyst should be able to
recognize the “best” equation to use in curve
fitting, regardless of how the process is described.

3.  THE  NATURE  OF  THE  ERROR  TERM 

When a random error term is added to the
mathematical equations, they are then referred
to as stochastic or model equations. It is at
this point that the curve fitter needs to take
note, for there are many assumptions which
can be made about the error terms. Consider
the model equation

$$\begin{aligned} y_t &= \eta_t + \epsilon_t \\ &= \alpha e^{-\beta t} + \epsilon_t \\ &= \alpha\omega^t + \epsilon_t, \end{aligned} \tag{1}$$

where $y_t$ is the observation at time $t$; $\alpha$ is
the true time-zero or initial value; $\beta$, the rate
constant; $\omega = \exp(-\beta)$; and $\epsilon_t$, the error
term.

Consideration will first be given to the
possible components of $\epsilon_t$. It may consist of
a combination of several sources or kinds of
“error”; e.g., model error, measurement error,
and a kind of replication-by-time interaction
error. For brevity, and for reasons which
will be given later, the last kind of error will
be referred to as process-control error.


A  model  error  may  derive  from  an  improper 
choice  of  a  mathematical  expression,  even 
though  the  fit  to  the  data  using  the  incorrect 
equation  may  be  considered  adequate.  The 
model error will most likely constitute a bias,
an overfitting or underfitting in various regions
of the curve. Its major effect will be to
increase the mean square deviation about the
fitted line. It is assumed in what follows that
there is no model error in $\epsilon_t$.

The measurement error, referred to herein
as $\delta_t$, may be viewed in the usual way: it is the
technic  error  introduced  into  the  observation 
when  the  measurement  is  being  obtained.  It 
may  involve  only  sensing  error,  the  observation 
being  automatically  recorded,  or  it  may  consist 
of  a  combination  of  errors,  a  human  reading 
error,  the  error  in  sensing,  plus  an  error  in 
one  or  more  analytic  procedures.  It  is  usually 
assumed  to  have  zero  mean  for  each  time  t. 

For certain problems, the most important
source of error is that due to nonregular or
random behavior of the response from time to
time. This deviation or error is considered to
be random over experiments. It arises when
the organism invokes some complicated control
to correct for overproduction or underproduction
of the measured quantity. For example,
the metabolic rate of the quantity studied may
be considered a constant, on the average.
The true amount present at time $t$, however,
may be $\alpha\omega^t + \nu_t$, where $\nu_t$ is the process-control
error. For the same animal, in a replicated
experiment, $\nu_t$ may vary so that the average
of $\nu_t$, $t$ fixed, over all these re-runs may reasonably
be assumed to be zero. The sign and
size of $\nu_t$ are determined by a host of factors
in the organism, so that it is truly a random
element at any point $t$. Thus, the term $\epsilon_t$ will
be written as the sum of two elements, $\nu_t$ and
$\delta_t$; i.e.,

$$\epsilon_t = \nu_t + \delta_t. \tag{2}$$

The next question to be considered is
whether the error terms are time dependent.
The nature of the biologic process being studied
and frequency of sampling are relevant factors.
If the process is continuous with measurements
being continuously made, then both $\nu_t$ and $\delta_t$
will be continuous curves. In this case, one
can imagine that the error deviations form
an undulating curve, weaving around the
average response curve. Thus, the deviation
at time $t + \Delta t$ will be functionally dependent
on the deviation at time $t$. The commonest
departure from a continuous process, continuously
observed, is obtained in situations
where sampling is at equally spaced time intervals.
In what follows, $\Delta t$ is taken to be
greater than zero and constant. A measure
of the time dependency between two observations
in time is the covariance of the two
observations. First, the covariance of the $\nu_t$
will be discussed. If the sampling interval, $\Delta t$,
is small, then as observed above, $\nu_t$ and $\nu_{t+\Delta t}$
are functionally dependent and would behave
as positively correlated quantities. If $\Delta t$ is
sufficiently large, one might observe a
zero correlation between $\nu_t$ and $\nu_{t+\Delta t}$; an
overyield at time $t$ in no way affects the
response at time $t + \Delta t$. If the overyield
at time $t$ gave rise to an undercorrection at
time $t + \Delta t$, a negative correlation would obtain.

Correlations among the $\delta_t$ may be quite different
from those among the $\nu_t$. For example,
$\delta_t$ and $\delta_{t+\Delta t}$ may be positively correlated for all
$\Delta t$. This could obtain if an observer is
recording the response of a system which
has been showing a steady but definite decrease
in time: if the response tends to
level out or asymptote, the observer is likely
to remember the response at a previous reading
and unconsciously round to effect a nonincreasing
response. This kind of behavior essentially
creates a moving average of the $\delta_t$. Thus, for
a particular system the covariance of $\nu_t$ and
$\nu_{t+\Delta t}$ may be negative and the covariance of
$\delta_t$ and $\delta_{t+\Delta t}$ positive. For most systems, it
seems reasonable to assume that all $\nu_t$ are independent
of all $\delta_t$.

As mentioned earlier, the assumptions about
the error terms are made with some particular
equation in mind. If the comments regarding
the $\epsilon_t$ are made for equation 1, the next problem
is to study the resulting effect on the error
terms when one uses another representation
of the process. For example, $\eta_t = \alpha \exp(-\beta t)$
satisfies the equation $\eta_t = \omega\eta_{t-1}$, where
$\omega$ is equal to $\exp(-\beta)$ and where
$\eta_t = \alpha \exp(-\beta t) = E(y_t)$. Then the equation

$$y_t = \omega\eta_{t-1} + \xi_t \tag{3}$$

is another way of writing equation 1. For this
situation, it is apparent that the assumptions
made about $\epsilon_t$ are the same as those for $\xi_t$ since
$\epsilon_t$ and $\xi_t$ are identical.

Next, consider a variation of equation 3,
the first order stochastic difference equation

$$y_t = \omega y_{t-1} + \xi_t. \tag{4}$$

The solution of 4 is also equation 1, but now
the error term $\epsilon_t$ of equation 1 is a function of
$\xi_0, \xi_1, \ldots, \xi_t$. It is instructive to study the
process, starting at time zero. The time-zero
reading, $y_0$, consists of a constant, $\alpha$, say, plus
a random error $\xi_0$, or $y_0 = \alpha + \xi_0$, where $\alpha$
is the true time-zero value. For the nitrogen-washout
study mentioned earlier, $\alpha = .79$, the
nitrogen concentration of air near sea level.
Building up equation 1 from 4,

$$\begin{aligned} y_1 &= \omega y_0 + \xi_1 \\ &= \omega(\alpha + \xi_0) + \xi_1 \\ &= \alpha\omega + \omega\xi_0 + \xi_1, \\ y_2 &= \omega y_1 + \xi_2 \\ &= \alpha\omega^2 + \omega^2\xi_0 + \omega\xi_1 + \xi_2, \\ &\;\;\vdots \\ y_t &= \alpha\omega^t + \sum_{i=0}^{t} \omega^{t-i}\xi_i. \end{aligned} \tag{5}$$

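Equating 5 with 1 indicates that

$$\epsilon_t = \sum_{i=0}^{t} \omega^{t-i}\xi_i = \omega\epsilon_{t-1} + \xi_t. \tag{6}$$

With an ID$(0, \sigma^2)$ assumption (independently
distributed with zero mean, variance $\sigma^2$) for
the $\xi_i$, then

$$\operatorname{Cov}(y_t, y_{t-k}) = \omega^k \sigma^2 \left( \frac{1 - \omega^{2(t-k+1)}}{1 - \omega^2} \right); \quad k = 0, 1, 2, \ldots, t. \tag{7}$$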

Model equation 4 is commonly referred to as
a simple autocorrelation model (see Anderson
(1)). Note that as $t$ becomes large, the
variance approaches $\sigma^2/(1 - \omega^2)$ and for $k$
small relative to $t$, the $k$th lag correlation coefficient
approaches $\omega^k$.
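
The covariance in equation 7 can be checked against a direct summation over the shared error terms. The following Python sketch is illustrative only: the function names and the numerical values of $\omega$, $\sigma^2$, $t$, and $k$ are assumptions, not quantities from the text. It sums $\sigma^2\omega^{t-i}\omega^{t-k-i}$ over the errors common to $y_t$ and $y_{t-k}$, compares the sum with the closed form, and evaluates the lag correlation for a large $t$.

```python
import math


def cov_direct(omega, sigma2, t, k):
    """Cov(y_t, y_{t-k}) by summing over the errors xi_0..xi_{t-k} they share."""
    return sigma2 * sum(
        omega ** (t - i) * omega ** (t - k - i) for i in range(t - k + 1)
    )


def cov_closed(omega, sigma2, t, k):
    """Closed form of equation 7."""
    return omega ** k * sigma2 * (1 - omega ** (2 * (t - k + 1))) / (1 - omega ** 2)


def lag_correlation(omega, t, k):
    """Correlation between y_t and y_{t-k} implied by equation 7."""
    var_t = cov_closed(omega, 1.0, t, 0)
    var_tk = cov_closed(omega, 1.0, t - k, 0)
    return cov_closed(omega, 1.0, t, k) / math.sqrt(var_t * var_tk)


omega, sigma2 = 0.8, 2.0             # hypothetical values
print(cov_direct(omega, sigma2, 10, 3), cov_closed(omega, sigma2, 10, 3))
print(cov_closed(omega, sigma2, 200, 0), sigma2 / (1 - omega ** 2))
print(lag_correlation(omega, 200, 3), omega ** 3)
```

Each printed pair agrees to rounding error, showing the variance settling at $\sigma^2/(1 - \omega^2)$ and the lag correlation at $\omega^k$ once $t$ is large.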

Instead of writing the autocorrelation model
as 4 or 5, it is frequently written as a pair of
equations,

$$\begin{aligned} y_t &= \alpha\omega^t + \epsilon_t \\ \epsilon_t &= \rho\epsilon_{t-1} + \xi_t. \end{aligned} \tag{8}$$

If $\rho = \omega$, equation 8 is equivalent to equation
5; if $\rho = 0$, equation 8 is equivalent to equation
3. In this more general formulation, one
can see that if $\rho$ is negative, adjacent observations
are negatively correlated. This becomes
somewhat clearer if one writes $-\rho$ for $\rho$ in 8.
Then the single equation analogous to equation
5 is

$$y_t = \alpha\omega^t + \sum_{i=0}^{t} (-\rho)^{t-i}\xi_i. \tag{9}$$

If the $\xi_i$ are ID$(0, \sigma^2)$, then

$$\operatorname{Cov}(y_t, y_{t-k}) = (-\rho)^k \sigma^2 \left( \frac{1 - \rho^{2(t-k+1)}}{1 - \rho^2} \right), \quad k = 0, 1, \ldots, t. \tag{10}$$

As $t$ increases and $k$ is relatively small, the $k$th
lag correlation coefficient very nearly becomes
$(-\rho)^k$. Thus, all odd lag correlations are negative
and all even are positive. This model may
be appropriate when the system corrects itself
for overproductions and underproductions as
the time course proceeds. Again, it should be
noted that the observed lag correlations will be
as indicated only if the sampling period coincides
with the “correcting” period. The importance
of the frequency of this time sampling
in relation to the system’s assumed behavior
cannot be overemphasized.

A model equation that is similar to equations
4 and 8 is the moving average model,

$$y_t = \alpha\omega^t + \epsilon_t, \qquad \epsilon_t = \sum_{i=0}^{k} m_i \xi_{t-i}. \tag{11}$$

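For example, if

$$m_i = \begin{cases} 1 & \text{for } i = 0 \\ -\rho & \text{for } i = 1 \\ 0 & \text{otherwise,} \end{cases} \tag{12}$$

then

$$\epsilon_t = -\rho\,\xi_{t-1} + \xi_t.$$

Assuming the $\xi_t$ are ID$(0, \sigma^2)$, then using the
values of $m_i$ from equation 12,

$$\operatorname{Cov}(y_t, y_{t-k}) = \begin{cases} (1 + \rho^2)\sigma^2, & \text{for } k = 0 \\ -\rho\sigma^2, & \text{for } k = 1 \\ 0, & \text{for } k = 2, \ldots, t. \end{cases} \tag{13}$$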

From equation 13, note that the first lag correlation
coefficient is $-\rho/(1 + \rho^2)$ and all
higher lag correlation coefficients are zero.
This will provide for a simpler kind of corrective
action by the system than does equation 9.
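
The pattern in equation 13 follows from a general rule for moving averages of independent errors: the lag-$k$ covariance is $\sigma^2\sum_i m_i m_{i+k}$. The sketch below, in Python, applies that rule to the weights of equation 12. The function name and the value $\rho = 0.5$ are assumptions made for illustration, and the rule is stated for times late enough that all the weighted errors exist.

```python
def ma_autocovariance(m, sigma2, k):
    """Lag-k covariance of eps_t = sum_i m[i] * xi_{t-i}, xi independent.

    m is the list of weights m_0, m_1, ...; sigma2 is the variance of xi.
    """
    if k >= len(m):
        return 0.0                   # no shared error terms beyond the window
    return sigma2 * sum(m[i] * m[i + k] for i in range(len(m) - k))


rho, sigma2 = 0.5, 1.0               # hypothetical values
weights = [1.0, -rho]                # equation 12
for k in range(4):
    print(k, ma_autocovariance(weights, sigma2, k))

lag1 = ma_autocovariance(weights, sigma2, 1) / ma_autocovariance(weights, sigma2, 0)
print(lag1, -rho / (1 + rho ** 2))
```

Longer weight lists can be passed in the same way to examine other moving average patterns.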

Model  equation  3  may  be  described  as  a 
mechanical  system.  It  is  appropriate  when  the 
response  at  time  t  is  made  up  of  two  parts;  a 
constant  times  the  true  response  at  time  t  —  1 
plus  a  random  error  term.  The  error  term  at 
time  t  is  independent  of  all  other  error  terms. 
Thus,  this  model  may  be  used  when  there  is 
only an independent measurement error. Equations
4, 8, and 11, on the other hand, are more
properly called feedback or historical models,
for the errors on previous occasions affect the
observation at time $t$. Most biologic systems
are probably of this latter type. A historical
model that allows for situations where the
process control and measurement errors are
correlated in different patterns, and, thus, is
perhaps more realistic than equation 8, is

$$y_t = \alpha\omega^t + \epsilon_t = \alpha\omega^t + \nu_t + \delta_t,$$

where

$$\nu_t = -\rho\,\nu_{t-1} + \gamma_t = \sum_{i=0}^{t} (-\rho)^{t-i}\gamma_i \tag{14}$$

and

$$\delta_t = \sum_{i=0}^{k} m_i x_{t-i}.$$

If the measurement errors are independent,
then $m_0 = 1$ and all other $m_i$ are zero. As
stated earlier, if there is no replication, then
$\nu_t$ and $\delta_t$ are inseparable. Even though there is
no hope of separation of the two, it may be
helpful to imagine the errors as behaving in
this fashion. Depending on the relative size
of the variances of $\nu_t$ and $\delta_t$, one may be able to
predict what the net effect will be.

Another whole class of models is that in
which the error is proportional to the level of
the measured response. These models have
utility, especially when the range of $y_t$ is
several fold. Furthermore, they are more
manageable under logarithmic transformation,
allowing for simple estimators for the
parameters of the equation. The model equation
is

$$y_t = \alpha\omega^t e^{\epsilon_t}. \tag{15}$$


One can take the logarithm of both sides
of the equation and if the $\epsilon_t$ are ID$(0, \sigma^2)$,
proceed to estimate the parameters in the usual
manner since the variance of $\ln(y_t)$ is $\sigma^2$ and
the covariances are zero. To see how different
autocorrelation patterns among the $\epsilon_t$ may affect
the model, however, equation 15 is written
in another form. For $\epsilon_t$ small, $e^{\epsilon_t} \approx 1 + \epsilon_t$,
and writing, as before, $\eta_t = \alpha e^{-\beta t}$, then $y_t$ can
be expressed approximately as

$$y_t = \alpha e^{-\beta t} + \eta_t\epsilon_t. \tag{16}$$

This equation may be written as

$$y_t = \omega\eta_{t-1} + \eta_t\epsilon_t. \tag{17}$$

If the $\epsilon_t$ in 15 are ID$(0, \sigma^2)$, then the error
terms in 16 are ID$(0, \eta_t^2\sigma^2)$. Equation 17 corresponds
to the mechanical system as given
in equation 3, with error proportional to the
true response at time $t$. The difference equation
analogous to equation 17, corresponding to
the autocorrelation model equation 4, is

$$y_t = \omega y_{t-1} + y_t\xi_t. \tag{18}$$

By assuming that the process starts at time
zero so that $y_0 = \alpha + y_0\xi_0$, then the solution
of equation 18 is

$$y_t = \frac{\alpha\omega^t}{\prod_{i=0}^{t} (1 - \xi_i)}. \tag{19}$$


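To a first degree of approximation,
$\frac{1}{1 - \xi_i} \approx 1 + \xi_i$, so that

$$\frac{1}{\prod_{i=0}^{t}(1 - \xi_i)} \approx \prod_{i=0}^{t}(1 + \xi_i) = 1 + \sum_{i=0}^{t}\xi_i + \sum_{i<j}\xi_i\xi_j + \cdots \approx 1 + \sum_{i=0}^{t}\xi_i.$$

Thus, equation 19 can be written as

$$y_t = \frac{\alpha\omega^t}{\prod_{i=0}^{t}(1 - \xi_i)} \approx \alpha\omega^t + \eta_t\left(\sum_{i=0}^{t}\xi_i\right).$$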

Another way of arriving at this same form is
to assume that the error term in 18 is proportional
to the true response at time $t$ rather
than to the observed response, i.e.,

$$y_t = \omega y_{t-1} + \eta_t\xi_t. \tag{20}$$

Then the solution is

$$y_t = \alpha\omega^t + \sum_{i=0}^{t}\omega^{t-i}\eta_i\xi_i = \eta_t\left(1 + \sum_{i=0}^{t}\xi_i\right). \tag{21}$$

If the $\xi_i$ in equation 21 are assumed to be
ID$(0, \sigma^2)$, then $E(y_t) = \eta_t$ and

$$\operatorname{Cov}(y_t, y_{t-k}) = (t - k + 1)\,\alpha^2\omega^{2t-k}\sigma^2. \tag{22}$$

The $k$th lag correlation coefficient is
$\left(1 - \frac{k}{t+1}\right)^{1/2}$; for $k$ small relative to $t$, it approaches
unity as $t$ becomes large. This result states
that  once  the  response  is  far  enough  along  in 
time  and  veers  to  a  particular  side  of  the 
average  response  curve,  then  the  observational 
curve  would  tend  to  stay  on  the  same  side 
for the remainder of the experiment. There
appears  to  be  no  way  one  can  distinguish  this 
kind  of  anomaly  from  organism  (animal-to- 
animal)  variability,  especially  if  the  experiment 
cannot  be  repeated  on  the  same  organism.  To 
allow  for  continued  recrossing  of  the  average 
response  curve,  negative  correlation  between 
adjacent  error  deviations  may  be  postulated. 
Thus,  model  equations  may  be  written  for  the 
proportional  error  models  that  allow  for  the 
control  of  the  system  similar  to  that  allowed 
in  equations  4  and  11. 
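
The lag correlation implied by equation 22 can be compared with a small random sampling experiment. The Python sketch below is an illustration under stated assumptions: normally distributed errors, a hypothetical $\sigma = 0.05$, a fixed random seed, and function names invented here. Because the deterministic factor $\eta_t$ in equation 21 does not affect a correlation, only the running sums of the errors are simulated.

```python
import math
import random


def lag_correlation(t, k):
    """Lag-k correlation implied by equation 22: sqrt(1 - k / (t + 1))."""
    return math.sqrt(1.0 - k / (t + 1.0))


def simulated_lag_correlation(t, k, runs=5000, sigma=0.05, seed=1):
    """Sample correlation of the cumulative errors at times t - k and t."""
    rng = random.Random(seed)
    early, late = [], []
    for _ in range(runs):
        total = 0.0
        for i in range(t + 1):
            total += rng.gauss(0.0, sigma)   # add xi_i to the running sum
            if i == t - k:
                early.append(total)          # cumulative error at time t - k
        late.append(total)                   # cumulative error at time t
    mean_e, mean_l = sum(early) / runs, sum(late) / runs
    cov = sum((a - mean_e) * (b - mean_l) for a, b in zip(early, late))
    var_e = sum((a - mean_e) ** 2 for a in early)
    var_l = sum((b - mean_l) ** 2 for b in late)
    return cov / math.sqrt(var_e * var_l)


print(lag_correlation(20, 2), simulated_lag_correlation(20, 2))
print(lag_correlation(99, 1))
```

The simulated value lies close to the theoretical one, and the correlation at $t = 99$, $k = 1$ is already near unity, which is the persistence described in the paragraph above.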

For model equation 21, the variance of
$\ln(y_t)$ is $(t + 1)\sigma^2$ for an ID$(0, \sigma^2)$ assumption
on the $\xi_i$. If the variance of $\ln(y_t)$ is not an
increasing function of $t$, one may either
postulate

$$\epsilon_t = \rho\,\epsilon_{t-1} + \xi_t, \tag{23}$$

where $\rho$ may be negative, or specify a moving
average relationship for the $\epsilon_t$, such as

$$\epsilon_t = \sum_{i=0}^{k} m_i\xi_{t-i}. \tag{24}$$

For these two cases, the model equations corresponding
to 21 may be written as

$$y_t = \eta_t(1 + \epsilon_t) = \eta_t\left(1 + \sum_{i=0}^{t}\rho^{t-i}\xi_i\right) \tag{25}$$

and

$$y_t = \eta_t(1 + \epsilon_t) = \eta_t\left(1 + \sum_{i=0}^{k} m_i\xi_{t-i}\right), \tag{26}$$

respectively. Note that with restrictions 23
and 24 on the error terms, and with the
ID$(0, \sigma^2)$ assumption on the $\xi_i$, the variances
and covariances of $\ln(y_t)$ will have the same
pattern as for the nonproportional error models.
Thus, for equation 25, the variance of $\ln(y_t)$ is
approximately $\sigma^2/(1 - \rho^2)$ for $t$ large, and for
equation 26, the variance is $\sigma^2\sum_i m_i^2$.

If the range of $y_t$ is small, the fitted equations
for the proportional error models differ
little from their nonproportional counterparts;
for these situations one might prefer to use
the proportional error models for estimation
purposes, since the logarithmic transformation
linearizes the mathematical expressions. A
discussion of estimation problems is given in
a later section.

4. INCREASED COMPLEXITY OF THE MODEL

Greater flexibility in the model is obtained
by extending equation 1 to the sum of two or
more exponentials,

$$y_t = \sum_{i=1}^{m}\alpha_i\omega_i^t + \epsilon_t. \tag{27}$$

For the mechanical model, the $\epsilon_t$ are independent;
for the autocorrelation models, they
are functions of previous random error terms.
For both types of models, $m$th order difference
equations may be written similar to those equations
involving one exponential term. Furthermore,
the $\epsilon_t$ may be assumed to be proportional
to the level of the response and with different
autocorrelation patterns as discussed in the
previous section.

I 


When the number of exponential terms is
three or more, there arises the question of the
determination of the number of true terms in
the model equation. For three exponential
terms, there are 6 constants to be fitted to the
data, usually assuring one a reasonable fit.
This will be discussed more fully in the next
section, but it does lead one to consider whether
it is reasonable to postulate that $\omega$ is continuously
distributed. For some biologic applications
$\omega$ can assume only a finite number of
values; for others (see reference 2, for example),
a continuous distribution is quite attractive.
An example of this is the nitrogen washout
problem discussed above; one may consider the
smallest volumes, $v_i$, as those for the individual
alveoli. Recall that the resting and expanded
volumes of the alveoli are used to determine the
$\omega_i$. From the thousands of $\omega_i$, one can visualize
the resulting distribution formed by grouping
all those that fall in the interval $(\omega, \omega + d\omega)$. If
one denotes this frequency distribution by
$f(\omega)\,d\omega$, then the expression for $y_t$ corresponding
to equation 27 is

$$y_t = \int \omega^t f(\omega)\,d\omega + \epsilon_t, \tag{28}$$

where the $\epsilon_t$ in this equation may be assumed
to be independent or autocorrelated. Furthermore,
either of these two assumptions on the
error terms may be used when the $\epsilon_t$ are proportional
to the response at time $t$. The model
equation is not specified, of course, until the
expression for $f(\omega)\,d\omega$ is named. In this laboratory,
the normal probability density has been
used for $f(\omega)\,d\omega$ in studying equation 28.
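
The trend term of equation 28 is easy to evaluate numerically once a density is named. The sketch below, in Python, is only an illustration: the mean 0.8 and standard deviation 0.05 are hypothetical, the density is renormalized over the interval from 0 to 1 (an assumption, since the text does not say how the range of $\omega$ was handled), and the trapezoidal rule stands in for whatever integration method was actually used.

```python
import math


def normal_pdf(x, mu, sd):
    """Normal probability density with mean mu and standard deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def mean_response(t, mu=0.8, sd=0.05, n=2000):
    """Trapezoidal estimate of the integral of omega**t * f(omega) over (0, 1).

    The density is renormalized on (0, 1) so that the value at t = 0 is 1.
    """
    h = 1.0 / n
    num = den = 0.0
    for j in range(n + 1):
        w = j * h
        end_factor = 0.5 if j in (0, n) else 1.0   # trapezoid end weights
        weight = h * end_factor * normal_pdf(w, mu, sd)
        num += weight * w ** t
        den += weight
    return num / den


for t in (0, 1, 5, 10, 20):
    print(t, mean_response(t), mean_response(1) ** t)
```

The second printed column is the single exponential with the same average dilution ratio. The integrated curve is never below it, because $\omega^t$ is a convex function of $\omega$, so a spread of dilution ratios produces a slower late decline than one ratio alone.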

To either equation 27 or 28, a term for the
asymptote may be added. For some applications
considered in this laboratory, it has
seemed desirable to modify equation 28 by
multiplying the integral by a positive constant
$< 1$.

5.  ESTIMATION  PROBLEMS 

If the variance-covariance matrix for the
$y_t$ is known aside from a constant multiplier
$\sigma^2$, then the Gauss-Newton iterative scheme
may be used to estimate the parameters in the
model equations discussed. For the model
equations with proportional errors, which are
clearly more manageable after logarithmic
transformation, a knowledge of the weights or
covariances for the $\ln(y_t)$ is required. Most
technics that are available for estimating the
parameters either assume the weights are
known or implicitly assume that the $y_t$, or the
$\ln(y_t)$, have equal weights. But the covariances
are generally not known a priori. One
can estimate the autocorrelation structure, but
it is doubtful if this approach is very practical;
it may not give good estimates for the
parameters. R. L. Anderson (1) indicates some
of the problems of estimation when the errors
are correlated. Also Zellner and Tiao (3) show
that the variances of the estimates may be
very large unless knowledge of the correlations
is employed in the estimation procedure.
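
A minimal sketch of the Gauss-Newton iteration for the single-term model $y_t = \alpha\omega^t + \epsilon_t$ is given below in Python. It is written under simplifying assumptions that go beyond the text: the weights are diagonal (independent errors with known relative variances), not a full variance-covariance matrix, starting values are taken from an unweighted straight-line fit to $\ln(y_t)$, and all function names and data values are hypothetical.

```python
import math


def loglinear_start(y):
    """Starting (alpha, omega) from least squares of ln(y_t) on t; needs y_t > 0."""
    n = len(y)
    logs = [math.log(v) for v in y]
    t_bar = (n - 1) / 2.0
    l_bar = sum(logs) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (l - l_bar) for t, l in enumerate(logs))
    slope = sxy / sxx
    return math.exp(l_bar - slope * t_bar), math.exp(slope)


def gauss_newton(y, alpha, omega, weights=None, iters=50, tol=1e-12):
    """Weighted Gauss-Newton for y_t = alpha * omega**t (diagonal weights only)."""
    n = len(y)
    w = weights or [1.0] * n
    for _ in range(iters):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for t in range(n):
            resid = y[t] - alpha * omega ** t
            d_alpha = omega ** t                                   # df/d(alpha)
            d_omega = alpha * t * omega ** (t - 1) if t > 0 else 0.0  # df/d(omega)
            a11 += w[t] * d_alpha * d_alpha
            a12 += w[t] * d_alpha * d_omega
            a22 += w[t] * d_omega * d_omega
            b1 += w[t] * d_alpha * resid
            b2 += w[t] * d_omega * resid
        det = a11 * a22 - a12 * a12
        step_a = (a22 * b1 - a12 * b2) / det     # solve the 2 x 2 normal equations
        step_w = (a11 * b2 - a12 * b1) / det
        alpha += step_a
        omega += step_w
        if abs(step_a) + abs(step_w) < tol:
            break
    return alpha, omega


y = [0.79 * 0.8 ** t for t in range(12)]         # hypothetical error-free data
print(gauss_newton(y, *loglinear_start(y)))
print(gauss_newton(y, 0.75, 0.78))
```

With correlated errors the same iteration would be applied after transforming the observations by the known covariance structure; that extension is not shown here.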

Consider the autocorrelation model where
there is $m = 1$ exponential term. One possible
approach is to estimate the lag 1 autocorrelation
coefficient by one of the standard
estimators. For certain applications, $\omega$ is the
only parameter of interest. For situations
where knowledge of $\alpha$ is also desired, one could
use this estimate of $\omega$ in the equation
$y_t = \alpha\omega^t + \epsilon_t$ to estimate $\alpha$. Thus if $\epsilon_t$ is as given in
equation 6, and the $\xi_i$ are ID$(0, \sigma^2)$, then from
equation 7 one can get the weighting functions
for the $y_t$. In general, this approach would
have limited usefulness unless one were quite
sure that $m = 1$ and also had a very good
estimate of $\omega$.

For $m \geq 2$ exponential terms, Cornell (4)
has proposed a scheme whereby one groups the
data into $2m$ equal sample-size categories,
which are nonoverlapping in time, and for each
category calculates the sum of the responses.
By using the fact that each partial sum is a
geometric series with an equal number of
terms, he is able, through algebraic manipulation,
to estimate the $2m$ parameters. The
scheme appears to work for well-conditioned
data. If further refinement is desired in the
estimates of the parameters obtained, they
may be used as initial or “starting values” for
the more tedious Gauss-Newton iterative
scheme.
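
The partial-sum idea is easiest to see for a single exponential term, where the algebra is immediate. The Python sketch below treats only that case, $m = 1$ with two groups. It is not a reproduction of the scheme in reference 4, whose algebra for $m \geq 2$ is not given here, and the function name and data are hypothetical. With $n$ observations per group, the two sums are $S_1 = \alpha(1 - \omega^n)/(1 - \omega)$ and $S_2 = \omega^n S_1$, so that $\omega = (S_2/S_1)^{1/n}$.

```python
def partial_sum_estimates(y):
    """Estimate (alpha, omega) for y_t = alpha * omega**t, t = 0..2n-1.

    The data are split into two nonoverlapping halves of n points each,
    and the two partial sums are treated as geometric series.
    """
    if len(y) < 2 or len(y) % 2:
        raise ValueError("need an even number of observations")
    n = len(y) // 2
    s1, s2 = sum(y[:n]), sum(y[n:])
    ratio = s2 / s1
    if not 0.0 < ratio < 1.0:
        raise ValueError("partial sums are not consistent with a decaying curve")
    omega = ratio ** (1.0 / n)                   # S2 / S1 = omega**n
    alpha = s1 * (1.0 - omega) / (1.0 - omega ** n)
    return alpha, omega


y = [0.79 * 0.8 ** t for t in range(12)]         # hypothetical error-free data
print(partial_sum_estimates(y))
```

Because each estimate is a function of sums, not of individual points, independent errors tend to average out within a group; that property is presumably what makes estimates of this kind useful as starting values.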

For the proportional error models, various
biologic research workers (see, for example,
reference 5) have proposed a “peel-off” method,
using a plot of the logarithm of the response
versus the metameter as a working graph.
This procedure implicitly assumes equation 15
or an extension thereof as the basic model.
For $m = 1$, the resulting scatter of points lies
approximately on a straight line. For $m \geq 2$,
the procedure is started by observing that the
points at the right-hand end of the scatter
diagram lie approximately on a straight line.
To this portion of the data, a straight line is
fitted and extrapolated back to the response
axis. Deviations are then calculated between
the observed and predicted $y_t$. If the logarithm
of the deviations appears to be linear, a line is
fitted to these data and the procedure is
completed, with $m = 2$. If the logarithm of
the second set of deviations has a definite
curvature, the second line is fitted only to the
straight-line portion of the deviations, and this
line is extrapolated back to the response axis.
The process is continued until all data points
are fitted.

The rationale for the procedure is clear
enough: as $t$ becomes large, the effects of the
smallest $\omega_i$, $i = 1, 2, \ldots, m - 1$, are washed
out, leaving only the effect of $\omega_m$, the largest $\omega$,
in the far right-hand end of the curve. From
this portion of the data, $\omega_m$ and the associated
$\alpha_m$ are estimated. One then calculates
$y_t' = y_t - \hat{y}_t$ and plots $\ln(y_t')$ versus $t$.

The next step allows one to estimate ω_{m-1}
and α_{m-1}. This scheme is followed until the
parameters of all exponential terms are
estimated. If the ω_i are few (m no greater
than 3, say) and well separated, and further,
if each α_i is a reasonable proportion of Σα_i,
this procedure gives quite satisfactory
estimates. This technic has been programmed
for a digital computer; some random sampling
experiments performed in this laboratory will
be reported in a later paper. As for other
similarly obtained estimates, the values found
in this manner may be used as initial or
starting values for the Gauss-Newton iterative
procedure. As for all methods, the assumptions
about the error terms must be considered. If
the proportional errors are independent, this
method gives good estimates of the parameters.
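The peel-off steps described above can be sketched in a few lines for the case m = 2. This is only an illustration, not the program written in this laboratory: the function names, the index arguments `head_end` and `tail_start` (which stand in for the straight-line portions an analyst would pick by eye from the working graph), and the noise-free test data are all assumptions made for the example.

```python
import math

def fit_line(ts, zs):
    """Ordinary least-squares line z = a + b*t; returns (a, b)."""
    n = len(ts)
    t_bar = sum(ts) / n
    z_bar = sum(zs) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxz = sum((t - t_bar) * (z - z_bar) for t, z in zip(ts, zs))
    slope = sxz / sxx
    return z_bar - slope * t_bar, slope

def peel_off_two_terms(ts, ys, head_end, tail_start):
    """Peel-off estimates for y_t = a_fast*w_fast**t + a_slow*w_slow**t.

    head_end and tail_start are hypothetical index arguments marking the
    straight-line portions of the log plot (chosen by eye in practice).
    """
    # Step 1: fit ln(y) on t over the right-hand tail -> slowest term.
    tail_t = ts[tail_start:]
    tail_z = [math.log(y) for y in ys[tail_start:]]
    a, b = fit_line(tail_t, tail_z)
    alpha_slow, omega_slow = math.exp(a), math.exp(b)

    # Step 2: deviations of the early points from the extrapolated line.
    head_t = ts[:head_end]
    devs = [y - alpha_slow * omega_slow ** t
            for t, y in zip(head_t, ys[:head_end])]
    if min(devs) <= 0:
        raise ValueError("non-positive deviation; pick another portion")

    # Step 3: fit ln(deviation) on t -> faster term.
    a, b = fit_line(head_t, [math.log(d) for d in devs])
    alpha_fast, omega_fast = math.exp(a), math.exp(b)
    return (alpha_fast, omega_fast), (alpha_slow, omega_slow)

# Noise-free artificial data: 6(0.5)^t + 4(0.9)^t, t = 0, 1, ..., 19.
ts = list(range(20))
ys = [6.0 * 0.5 ** t + 4.0 * 0.9 ** t for t in ts]
fast, slow = peel_off_two_terms(ts, ys, head_end=5, tail_start=12)
print(fast, slow)
```

Even without error terms the estimates are slightly biased, because the fast term has not vanished completely in the tail; this is one reason the values are better regarded as starting values for an iterative scheme than as final estimates.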

Another estimation scheme for the
parameters in equation 27 was given a number
of years ago by Prony and discussed by
Whitaker et al. (6). Prony formally treated
the difference equation

$$y_t + \gamma_1 y_{t-1} + \gamma_2 y_{t-2} + \cdots + \gamma_m y_{t-m} = 0 \tag{29}$$

as a multiple regression problem, considering
y_t as the dependent variable and y_{t-1}, y_{t-2},
..., y_{t-m} as m independent variables. The γ_i
may be estimated by ordinary least squares
and are, apart from sign, the elementary
symmetric functions of the ω_i; i.e.,
γ_1 = -Σ_i ω_i, γ_2 = Σ_{i<k} ω_i ω_k, etc.
Then by using the estimates of the γ_i in a
polynomial of degree m, the ω_i can be obtained
as the m roots. They can then be substituted in
equation 27 and the α_i obtained by ordinary
least squares. Householder (7) has observed
that the procedure has two serious drawbacks:
“. . . it provides no means for weighting the
observations in accordance with their supposed
precision. Second, it provides no criterion for
determining the number of exponentials
required for the fitting . . . .” Assuming known
weights for the y_t, he gives an iterative technic
for getting valid least squares estimates and
provides, also, a criterion for deciding on the
number of exponential terms needed for an
adequate fit. For m = 1, Prony’s method provides
an estimator of γ_1 = -ω_1, which is equivalent
to one of the unsophisticated estimates of the
lag 1 autocorrelation coefficient, namely,
Σ y_t y_{t-1} / Σ y_{t-1}^2.
The estimation of the parameters of
equation 28 is somewhat more difficult than for
those of 27. By using the normal probability
density function as the law to describe the
distribution of the random variable ω, the
problem is to estimate the mean and variance
of this distribution from the data. Since the
right-hand side of equation 28 gives the t-th
moment of ω about the origin, one can get
preliminary estimates of μ and σ² by using the
method of moments thus:

$$\begin{aligned} y_1 &\doteq \mu \\ y_2 &\doteq \mu^2 + \sigma^2 \end{aligned}$$

where ≐ stands for “is an estimate of.”
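Solving the two moment relations above for the parameters gives μ ≐ y_1 and σ² ≐ y_2 - y_1². A minimal sketch follows; the function name and the numerical responses are hypothetical, and the responses are assumed to be on the scale implied by the moment relations (response equal to the t-th moment of ω).

```python
def moment_start_values(y1, y2):
    """Preliminary (mu, sigma^2) from the responses at t = 1 and t = 2.

    Uses y1 ~ mu and y2 ~ mu^2 + sigma^2, so sigma^2 ~ y2 - y1^2.
    """
    mu = y1
    var = y2 - y1 ** 2
    return mu, var

# Hypothetical responses at t = 1 and t = 2.
mu0, var0 = moment_start_values(0.80, 0.65)
print(mu0, var0)
```

A negative value for the variance would indicate that the two observed responses are not consistent with the assumed model, in which case some other choice of starting values is needed.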

These initial estimates can be used in the
Gauss-Newton procedure to obtain more refined
estimates. A requirement of knowledge
of the weights for the y_t exists here as in all
estimation problems, of course. The results of
empirical sampling studies using a modified
version of model equation 28, with errors
proportional to the true response, will be reported
in a subsequent paper.

For most of the Gauss-Newton estimation
approaches, it has been the experience of this
laboratory that considerable improvement in
convergence as well as speed of convergence is
obtained if one adds to the Gauss-Newton
scheme the method of the path of steepest
ascent. The reader is referred to D. W.
Marquardt’s discussion (8) of this modification.
Yet, even this approach sometimes fails owing
to the high correlation of the estimates of the
parameters. In some data the correlation
exceeds 0.98 in absolute value. Thus,
convergence is not always possible even for the
most sophisticated iterative procedures.
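The idea of combining the Gauss-Newton step with a gradient step can be illustrated by a damped iteration of the kind now usually associated with Marquardt's name. The sketch below is for the single term α ω^t with unweighted least squares; it is not claimed to reproduce the procedure of reference 8, and the function names, the damping constant `lam`, the factor of 10, and the stopping rules are all assumptions made for the example.

```python
def sse(ts, ys, a, w):
    """Sum of squared deviations from the curve a * w**t."""
    return sum((y - a * w ** t) ** 2 for t, y in zip(ts, ys))

def damped_gauss_newton(ts, ys, a, w, lam=1e-2, iters=100, tol=1e-12):
    """Fit y_t = a * w**t; large lam gives short gradient-like steps,
    small lam gives nearly pure Gauss-Newton steps."""
    current = sse(ts, ys, a, w)
    for _ in range(iters):
        # Accumulate J'J and J'r for the two parameters (a, w).
        jaa = jaw = jww = ga = gw = 0.0
        for t, y in zip(ts, ys):
            da = w ** t                # derivative of the curve w.r.t. a
            dw = a * t * w ** (t - 1)  # derivative of the curve w.r.t. w
            r = y - a * da             # current residual
            jaa += da * da
            jaw += da * dw
            jww += dw * dw
            ga += da * r
            gw += dw * r
        # Inflate the diagonal by (1 + lam) and solve the 2x2 system.
        m11, m22 = jaa * (1.0 + lam), jww * (1.0 + lam)
        det = m11 * m22 - jaw * jaw
        step_a = (ga * m22 - jaw * gw) / det
        step_w = (m11 * gw - jaw * ga) / det
        trial = sse(ts, ys, a + step_a, w + step_w)
        if trial < current:
            # Accept the step and move toward Gauss-Newton.
            a, w = a + step_a, w + step_w
            gain, current = current - trial, trial
            lam /= 10.0
            if gain < tol:
                break
        else:
            # Reject the step and shorten it.
            lam *= 10.0
            if lam > 1e12:
                break
    return a, w, current

# Noise-free artificial data 5(0.8)^t with deliberately poor starting values.
ts = list(range(15))
ys = [5.0 * 0.8 ** t for t in ts]
a_hat, w_hat, final_sse = damped_gauss_newton(ts, ys, a=3.0, w=0.6)
print(a_hat, w_hat, final_sse)
```

From these starting values the undamped Gauss-Newton step overshoots to ω > 1 and increases the sum of squares; the damping rejects such steps and shortens them until the fit improves.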

6.  THE  NUMBER  OF  EXPONENTIAL 
TERMS 

Determining the number of exponential
terms is a serious problem when m ≥ 3; i.e.,
distinguishing between m = 3 and m = 4 is
not easy when the covariances of the y_t must
be estimated from the data. Carried one step
further, deciding whether one has a finite but
large number of terms as opposed to an infinite
number is indeed difficult. The problem is
especially complicated by the fact that the
estimates of the parameters are highly
correlated. The situation is roughly analogous to
two other statistical procedures: determining
the degree of a polynomial when there is no
replication error and determining the number
of common factors in factor analysis studies.
In the latter situation, the “tests” are usually
administered only once and the factoring is
performed on a correlation matrix; thus, there is
no external estimate of error available for
assessing statistical significance.

As indicated earlier, Householder’s scheme
(7) for determining the number of exponential
terms requires knowledge of the proper weights
for the y_t; i.e., it requires the known
variance-covariance matrix of the ε_t, aside from the
constant multiplier σ². Thus, this technic is
of little use in most practical situations.
Watson (9) has discussed the problems of
estimating regression coefficients when an incorrect
transformation is used on the ε_t; that is, when
assumed weights for the y_t are in error. He is
not too hopeful of the approach of transforming
the ε_t to remove the effect of the autocorrelation
in least squares analysis when the covariance
of the y_t must be estimated from the data.
Thus, until better approaches are found, the
data processor must proceed with the curve
fitting even though the proper weights are not
known. About all one can do is either assume
that the y_t are independent and have equal
variances, or use some transformation that will
at least give equal variances for the y_t, and
look at the mean square deviation about the
fitted line. Corresponding to polynomial curve
fitting where the statistical significance of
the coefficient of the last fitted term is
assessed after each step, one can add exponential
terms until the mean square of the residuals
is satisfactorily small. However, there is no
test of statistical significance available for
determining the number of exponential terms
when the weights of the y_t are unknown. The
reader is referred to a discussion by Siddiqui
(10) of the problems of significance testing of
regression coefficients in linear models when
the errors are correlated. Even if the weights
of the y_t are known, however, this step-by-step
fitting scheme may not distinguish between
the m-term model 27 and the continuous model
28. By using the normal law for the density
f(ω) dω in model 28, only two constants
are fitted to the data; for model 27, 2m
constants are fitted. If m is large, then in general
one would expect a better fit for model 27 than
for the two-parameter continuous models.
Therefore, other information must be brought
to bear on model choice; the choice at any point
in time will be based on intuition coupled with
one’s understanding of the biologic process
under study.
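The quantity examined at each step of such a term-by-term scheme is the mean square of the residuals about the fitted curve. A minimal sketch of its calculation follows, assuming unweighted deviations and n - 2m degrees of freedom for an m-term fit (two constants per term); the function name and the data are hypothetical, and what counts as "satisfactorily small" is left to the analyst, since no significance test is implied.

```python
def residual_mean_square(ts, ys, terms):
    """Mean square deviation about a fitted sum of exponentials.

    terms is a list of (alpha, omega) pairs; the divisor n - 2m is an
    assumption (two fitted constants per exponential term).
    """
    n, m = len(ys), len(terms)
    if n <= 2 * m:
        raise ValueError("too few observations for this many terms")
    ss = sum(
        (y - sum(a * w ** t for a, w in terms)) ** 2
        for t, y in zip(ts, ys)
    )
    return ss / (n - 2 * m)

# Hypothetical observations and a hypothetical one-term fit 4(0.5)^t.
ts = [0, 1, 2, 3, 4]
ys = [4.1, 1.9, 1.1, 0.4, 0.3]
ms_one_term = residual_mean_square(ts, ys, [(4.0, 0.5)])
print(ms_one_term)
```

In use, the function would be called after each additional term is fitted, and the successive mean squares compared.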

Random sampling studies are planned in
this laboratory for models with relatively
small m. Attempts will be made to fit
m + n, m + n - 1, ..., m, m - 1, ..., m - n
exponential terms to the artificially
generated data for various error patterns. It
is hoped that from these studies, some notion
can be obtained about the number of terms
necessary for an “adequate” fit as well as some
feeling about the effect of unequal weights for
the y_t on the estimates of the parameters when
the weights are not assumed known.


7.  CONCLUDING  REMARKS 

For exponential-decay data, the major
problem facing a curve fitter is in proper model
choice. This includes consideration of not only
the proper mathematical expression to be fitted
but also rather intimate knowledge of the
nature of the error terms. These problems
have been discussed and comments have been
made about a variety of models and error
assumptions that one may consider.

It  has  also  been  emphasized  that  no  really 
good  procedures  are  available  for  curve  fitting 
if  the  variance-covariance  matrix  of  the  error 
terms  is  not  known  or  assumed  to  be  known. 

For certain assumed model equations,
various methods of estimation of the parameters
are being studied in this laboratory. Under
investigation is the model equation 27, m = 2
and 3, with the ε_t independent and proportional
to the true response at time t. The
performance of the peel-off procedure is under
study. The estimates from this procedure are
used as starting values for a modified
Gauss-Newton iterative scheme. Also planned are
studies where m is fixed and an attempt is
made to fit m + n, m + n - 1, ..., m, m - 1,
m - 2, ..., m - n terms. The effect of
sample size on estimating procedures is also to
be looked into. Finally, a modified version of
model 28 is under study and will be fitted to
some empirical data once good estimating
procedures are developed for the parameters of
that model equation. The results of these
investigations will be presented in subsequent
reports.

The computer for which programs have
been and are being written is a Philco 2000,
8K memory.